Abstract. This report gives an overview of tools and methods for the reconstruction of highly
boosted top quark decays at the LHC. The focus is on hadronic decays; in particular, the
current status of top quark taggers in physics analyses is presented. The most
widely used jet substructure techniques, normally used in combination with top quark taggers,
are reviewed. Special techniques to treat pileup in large-cone jets are described, along with a
comparison of the performance of several boosted top quark reconstruction techniques.


1. Strategy 

The strategy for reconstructing boosted top quarks can be subdivided into two
categories, depending on the decay of the top quark. For high-pT top quarks decaying
leptonically, the main issue is the small angular separation between the lepton and the b-jet.
Classic lepton isolation algorithms might then reject boosted leptonic top quarks. A solution to
this problem consists in shrinking the isolation cone around the lepton as a function of its pT [3].
For boosted hadronic top quark decays, the main topic of this document, the hadronic decay
products might be too close together to be reconstructed separately with standard jet
algorithms and sizes. The strategy for the reconstruction of boosted hadronic top quarks can
then be outlined as:

(i) use a large-cone jet (R ≥ 0.8) to cluster the whole top quark decay;

(ii) apply an algorithm that looks inside the large-cone jet to try and recover the decay products 
of the top quark (top tagger); 

(iii) use jet substructure variables to discriminate between real top decays and other processes.

1.1. Jet grooming techniques

The starting point of top quark taggers is the clustering of jets with a large radius (R ≥ 0.8);
however, these jets tend to collect a lot of soft QCD radiation. Dedicated algorithms, called
“jet groomers”, are needed to resolve the hard decay products by removing soft and wide-angle
radiation.

One of these algorithms is called “jet trimming” [3], and consists of the following steps:

(i) inside the large-cone jet, cluster subjets of radius R_sub using the kt algorithm;

(ii) reject the soft subjets that do not satisfy the condition pT(subjet)/pT(jet) > f.

The optimal values of the parameters of the algorithm (R_sub and f) are determined through
Monte Carlo simulations. Dedicated studies using full-detector simulation [5] show an
improvement in the performance of the reconstructed jet mass variable.
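
The two trimming steps above can be illustrated with a minimal, self-contained sketch. It is not the implementation used by either experiment: the naive O(N^3) inclusive kt clustering, the E-scheme recombination, the massless-constituent helper and all function names (`four_vector`, `kt_cluster`, `trim`, and `f_cut` for the threshold f) are assumptions made for illustration only.

```python
import math

def four_vector(pt, y, phi):
    """Massless four-momentum (px, py, pz, E) from pT, rapidity and azimuth."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(y), pt * math.cosh(y))

def pt_of(p):
    """Transverse momentum of a four-momentum."""
    return math.hypot(p[0], p[1])

def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def delta_r2(a, b):
    """Squared rapidity-azimuth distance between two four-momenta."""
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2

def add(a, b):
    """E-scheme recombination: component-wise sum of four-momenta."""
    return tuple(x + y for x, y in zip(a, b))

def kt_cluster(particles, radius):
    """Naive inclusive kt clustering; returns all final (sub)jets."""
    objs = list(particles)
    jets = []
    while objs:
        # Smallest beam distance d_iB = pT_i^2.
        beam = min(range(len(objs)), key=lambda i: pt_of(objs[i]))
        best, pair = pt_of(objs[beam]) ** 2, None
        # Smallest pairwise distance d_ij = min(pT_i, pT_j)^2 * dR_ij^2 / R^2.
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                d = (min(pt_of(objs[i]), pt_of(objs[j])) ** 2
                     * delta_r2(objs[i], objs[j]) / radius ** 2)
                if d < best:
                    best, pair = d, (i, j)
        if pair is None:
            jets.append(objs.pop(beam))   # promote to a final subjet
        else:
            i, j = pair
            merged = add(objs[i], objs[j])
            del objs[j], objs[i]          # j > i, so delete j first
            objs.append(merged)
    return jets

def trim(constituents, r_sub, f_cut):
    """Keep only the subjets with pT(subjet) / pT(jet) > f_cut."""
    jet = constituents[0]
    for c in constituents[1:]:
        jet = add(jet, c)
    subjets = kt_cluster(constituents, r_sub)
    return [s for s in subjets if pt_of(s) / pt_of(jet) > f_cut]
```

In a real analysis the reclustering would be delegated to a dedicated jet library; the sketch only makes explicit where the two parameters R_sub and f enter.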


2. Top quark tagging algorithms 

This section summarizes the top tagging algorithms most widely used by the CMS
[1] and ATLAS [2] experiments at the LHC.

2.1. The CMSTopTagger algorithm 

The CMSTopTagger algorithm uses a Cambridge-Aachen jet with a radius of 0.8 as a starting
point. A detailed description of the algorithm can be found in Ref. [6]. The CMSTopTagger has
been commissioned by the CMS Collaboration [7].

2.2. The HEPTopTagger algorithm 

The HEPTopTagger algorithm uses a Cambridge-Aachen jet with a radius of 1.5 as a starting
point. Given the larger radius of the starting jet, this algorithm is able to cluster top decays with
a smaller boost than the other algorithms. A detailed description of the algorithm
can be found in Ref. [8]. The HEPTopTagger has been commissioned by both the ATLAS and
CMS Collaborations [9], [7].

The tagger’s efficiency drops at very high pT because the initial clustering radius is too
large and collects too many particles from the underlying event or pileup. A series
of improvements has therefore been developed, leading to the Multi-R HEPTopTagger: the
algorithm now tests different initial clustering radii and finds the optimal one. The optimal
radius as a function of jet pT can be used as a tagging variable.

2.3. Calibration and data-driven corrections 

Top tagging algorithms based on large-cone jets are difficult to calibrate for use on
real data.

The strategy adopted by the CMS Collaboration consists in deriving data-simulation
correction factors computed using a “tag and probe” method in a semileptonic tt̄-enriched
selection. The idea is to “tag” an event by identifying a leptonic top, then probe the tagger’s
efficiency on the hadronic side of the tt̄ event. The comparison of the efficiencies derived in data
and simulation gives the correction factors, measured as a function of η, pT, and Monte Carlo
generator [7].
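
As a rough illustration of the arithmetic behind such a correction factor, the sketch below computes a tagging efficiency in data and in simulation and takes their ratio. It assumes a background-free probe sample and simple binomial uncertainties, neither of which is stated in the text; the function names and the event counts in the usage lines are hypothetical.

```python
import math

def efficiency(n_pass, n_total):
    """Tagging efficiency on the probe side, with a simple binomial uncertainty."""
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err

def scale_factor(data, simulation):
    """Data/simulation correction factor from (n_pass, n_total) pairs.

    The relative uncertainties are added in quadrature, assuming the two
    efficiencies are statistically independent.
    """
    eff_d, err_d = efficiency(*data)
    eff_s, err_s = efficiency(*simulation)
    sf = eff_d / eff_s
    err = sf * math.hypot(err_d / eff_d, err_s / eff_s)
    return sf, err

# Made-up probe counts for one (eta, pT) bin, for illustration only.
sf, sf_err = scale_factor(data=(300, 1000), simulation=(320, 1000))
```

In practice the calculation would be repeated in each bin of η and pT, and for each Monte Carlo generator.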

The ATLAS Collaboration uses a different approach: for the subjets of the HEPTopTagger, a
dedicated calibration is performed for different subjet radii R. For large-cone jets, the mass scale is
validated by comparing, in data and simulation, the ratio of the calorimeter-jet mass to the mass
of the corresponding track jet [5].

2.4. Shower deconstruction tagger

The starting point of the shower deconstruction tagger is the decomposition of the large-cone jet
into small jets with radii between 0.1 and 0.3 (microjets). A detailed description of the algorithm
can be found in Ref. [?]. Comparison of data with Monte Carlo simulations shows a reliable
behavior of the discriminant and of the kinematic variables related to the microjets [?].

3. Further jet substructure techniques 

3.1. B-tagging in boosted topologies 

b-jet tagging is an essential tool in many physics analyses to distinguish the interesting signal
containing top quarks from the multijet background.

b-tagging algorithms use information from jet tracks (especially the impact parameter of the
track) and secondary vertices. The variables are then combined in a discriminator obtained
from a neural network (MV1 tagger, used by the ATLAS Collaboration) or a likelihood function
(combined secondary vertex, CSV, used by the CMS Collaboration).


In the boosted regime, the decay products of the hard-scatter process have a small angular
separation; in addition, there is a higher probability of contamination with tracks from
light-flavor jets. These factors degrade the performance of b-tagging algorithms.

The ATLAS Collaboration developed a series of improvements to the current b-tagging
algorithm [?], [?]. The first of these improvements is the addition of new variables
with more discrimination power between real b-jets and misidentified b-jets. The neural network
used to derive the final discriminator variable is then retrained using samples enriched in high-pT
b-jets. In a boosted topology, for a given b-tagging efficiency, the new algorithm yields
a light-flavor rejection rate that is, on average, twice that of the old algorithm.

The second improvement in the ATLAS b-tagging framework is the use of different
input jet collections. Jets with a smaller cone resolve the boosted decay products
better; in addition, track jets provide a better jet direction resolution.

The CMS Collaboration uses two different approaches for b-tagging in boosted topologies
[12]. The first approach consists in running the CSV algorithm on the large-cone jets used to
cluster the decay products of the hadronic top quark.

The second approach consists in running the CSV algorithm directly on the subjets found by a
top quark tagging algorithm. This is a more natural way to address
b-tagging in top jets, as only one of the subjets is expected to come from the decay of a b
quark. A performance assessment of the two approaches confirms that, for top jet identification,
subjet b-tagging performs best in both medium and very high boost
scenarios. Other improvements to the CSV algorithm have been implemented in the b-tagging
framework [13]; among them is the use of the inclusive vertex finder (IVF) to search for
secondary vertices.


3.2. Other variables 

A number of other jet substructure variables provide discrimination power between jets coming 
from high-pT top quark decays and other processes. 

The N-subjettiness ($\tau_N$) [16], used by the CMS and ATLAS Collaborations, quantifies
how well a jet of radius $R_{\mathrm{jet}}$ can be described as containing N or fewer kt subjets. The variable
is defined as:

\[ \tau_N = \frac{1}{d_0} \sum_k p_{T,k} \min\{\Delta R_{1,k}, \Delta R_{2,k}, \ldots, \Delta R_{N,k}\} \]

where k runs over all the jet constituents and $d_0 = \sum_k p_{T,k} R_{\mathrm{jet}}$. The algorithm computes the
$p_T$-weighted average of the minimum $\Delta R(\text{jet constituent}, \text{subjet axis})/R_{\mathrm{jet}}$. For boosted hadronic
top quarks the variable ratios $\tau_3/\tau_2$ and $\tau_2/\tau_1$ are relevant.
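
The definition above translates almost directly into code. The following sketch assumes that the N subjet axes have already been found (for example from kt subjets) and are passed in as (rapidity, azimuth) pairs; constituents are (pT, rapidity, azimuth) triples. The function names and this input format are illustrative choices, not part of any experiment's software.

```python
import math

def delta_r(y1, phi1, y2, phi2):
    """Rapidity-azimuth distance, with the azimuthal difference wrapped to [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)

def n_subjettiness(constituents, axes, r_jet):
    """tau_N for constituents (pT, y, phi) and N candidate axes (y, phi).

    Implements tau_N = (1/d0) * sum_k pT_k * min_i dR(i, k),
    with d0 = sum_k pT_k * R_jet.
    """
    d0 = sum(pt for pt, _, _ in constituents) * r_jet
    total = sum(
        pt * min(delta_r(y, phi, ay, aphi) for ay, aphi in axes)
        for pt, y, phi in constituents
    )
    return total / d0

# Toy two-prong jet: two hard constituents plus one soft one near the first prong.
jet = [(100.0, 0.3, 0.0), (90.0, -0.3, 0.0), (10.0, 0.3, 0.1)]
tau1 = n_subjettiness(jet, axes=[(0.0, 0.0)], r_jet=1.0)
tau2 = n_subjettiness(jet, axes=[(0.3, 0.0), (-0.3, 0.0)], r_jet=1.0)
tau21 = tau2 / tau1  # small for a jet that is well described by two subjets
```

A small ratio such as `tau21` signals that adding one more axis absorbs most of the remaining radiation, which is why the ratios rather than the individual values are used for tagging.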

The kt splitting scale variable (√d_ij), used by the ATLAS Collaboration, describes how
likely it is for a jet to originate from a two- or three-prong decay. The algorithm is described
in Ref. [5].

Both of these variables have been commissioned on data collected at the LHC at a
center-of-mass energy of 8 TeV [?], [?]. The variables have been used in a number of physics analyses.


3.3. The semi-resolved case and W-tagging 

In medium boost regimes (pT(jet) ≈ m_top) the angular separation between the decay products of a
top quark might be large enough that a large-cone jet is not able to cluster the whole decay. In
these cases it is possible to cluster the b-jet and the hadronic decay products of the W boson in
two separate jets. For the b-jet reconstruction, standard-size jets (R = 0.4-0.5) and standard
b-tagging techniques can be used. The hadronic W boson decay can then be clustered separately
in a large-cone jet (W-jet).

To distinguish jets originating from boosted W boson decays from other processes, a class of
algorithms called W-taggers is used. W-jets are a benchmark topology for jet substructure
studies, so a large number of techniques were developed in the past few years [?].
One of the most widely used techniques to identify large-cone jets coming from W bosons consists
of the following steps:

(i) use a jet grooming algorithm; 

(ii) compute the jet mass from the groomed jet and apply a cut on the groomed jet mass; 

(iii) optionally, use a jet substructure variable to select two-prong decays (e.g. N-subjettiness).
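
Steps (ii) and (iii) amount to a simple selection, sketched below. Grooming (step (i)) is assumed to have been done upstream, so the function receives the groomed jet mass and the N-subjettiness ratio τ2/τ1 directly. The function name, the argument names and the numerical thresholds in the usage lines are hypothetical placeholders, not working points of either experiment.

```python
def passes_w_tag(groomed_mass, tau21, *, mass_min, mass_max, tau21_max=None):
    """Return True if a groomed large-cone jet passes a simple W-tag selection.

    groomed_mass : mass of the jet after grooming (step ii)
    tau21        : tau_2 / tau_1 of the jet (step iii, optional)
    The thresholds must be supplied by the caller.
    """
    if not (mass_min < groomed_mass < mass_max):
        return False
    # The substructure requirement is applied only if a threshold is given.
    if tau21_max is not None and tau21 >= tau21_max:
        return False
    return True

# Placeholder thresholds, for illustration only.
tagged = passes_w_tag(82.0, 0.3, mass_min=60.0, mass_max=100.0, tau21_max=0.5)
```

Keeping the substructure cut optional mirrors the list above, where the two-prong requirement is an additional handle on top of the mass window.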

3.4. Pileup mitigation techniques for Run 2

Pileup represents a major issue in the reconstruction and energy measurement of large-cone 
jets, commonly used for top quark tagging. Jet substructure variables built starting from the jet 
components are also affected by the same problem. Many standard techniques to treat pileup 
contamination exist already. However, dedicated techniques are being developed to address 
pileup in the forthcoming data taking period at the LHC. One of these techniques is pileup per 
particle identification (PUPPI) [18] . 

This new approach consists in assigning a weight to each reconstructed object in the detector
(by scaling its momentum) based on pileup event properties and tracking information. The
novelty of the approach lies in acting directly on the inputs to the jet clustering
algorithm; as an immediate consequence, all the jet substructure variables
computed from the jet constituents are automatically corrected by the procedure [?].
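
The effect of acting on the clustering inputs can be shown with a toy example. The sketch below does not compute PUPPI weights: they are taken as given inputs between 0 and 1, and how they are derived from pileup properties and tracking information is outside its scope. It only shows that, once each four-momentum is rescaled, a constituent-based quantity such as the jet mass is corrected without further action. All names and numbers are hypothetical.

```python
import math

def apply_weights(particles, weights):
    """Scale each four-momentum (px, py, pz, E) by its weight; drop zero-weight particles."""
    return [tuple(w * c for c in p) for p, w in zip(particles, weights) if w > 0.0]

def jet_mass(particles):
    """Invariant mass of the four-momentum sum of the given particles."""
    px, py, pz, e = (sum(p[i] for p in particles) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))

# Two hard constituents and one soft, pileup-like particle (toy numbers).
particles = [(30.0, 0.0, 40.0, 50.0), (0.0, 30.0, 40.0, 50.0), (-3.0, 0.0, 4.0, 5.0)]
weights = [1.0, 1.0, 0.0]  # assumed output of the per-particle weighting
cleaned = apply_weights(particles, weights)
mass_before = jet_mass(particles)
mass_after = jet_mass(cleaned)
```

Any other variable built from the same constituent list, such as N-subjettiness, would be evaluated on `cleaned` in the same way.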

4. Algorithm comparison 

The various top tagging algorithms and jet substructure variables can be compared by computing
performance curves (sometimes also called ROC curves), which show the top misidentification
probability as a function of the tagger’s efficiency. A continuous line can be obtained for
each tagger by varying the parameters of the algorithm. Different tagging algorithms and
jet substructure variables can be combined to obtain a better discrimination power.
A performance comparison is shown for ATLAS and CMS in Fig. 1.
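
A performance curve of this kind can be built by scanning a threshold on a tagger discriminant. The sketch below assumes that larger discriminant values are more top-like and that separate lists of values are available for true top jets and for background jets; the function name and the toy values are hypothetical.

```python
def performance_curve(signal_scores, background_scores, thresholds):
    """Return (efficiency, misidentification probability) pairs, one per threshold.

    A jet is tagged when its discriminant value is strictly above the threshold.
    """
    points = []
    for cut in thresholds:
        eff = sum(s > cut for s in signal_scores) / len(signal_scores)
        mistag = sum(b > cut for b in background_scores) / len(background_scores)
        points.append((eff, mistag))
    return points

# Toy discriminant values, for illustration only.
signal = [0.9, 0.8, 0.7, 0.4]
background = [0.6, 0.3, 0.2, 0.1]
curve = performance_curve(signal, background, thresholds=[0.0, 0.5, 1.0])
```

Each threshold gives one point; a finer scan of thresholds, or of the tagger's own parameters, fills in the continuous line described above.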

5. Conclusions 

Boosted top quark tagging and, more generally, jet substructure techniques are a very active
field of research, with many new theoretical and experimental developments presented every year.
These tools are now widely used in searches for physics beyond the standard model with top
quarks in the final state. Such techniques will be even more relevant during Run 2 of the LHC.

Studies are ongoing to assess the performance of the taggers under the pileup scenarios
expected during the next run of the LHC and to better understand the systematic
uncertainties associated with the use of top quark tagging in physics analyses.

Figure 1. Top row: top tagging performance in ATLAS using trimmed anti-kt R=1.0 jets and
Cambridge-Aachen R=1.2 jets (for the HEPTopTagger), pT(jet) > 550 GeV [?]. Bottom row:
top tagging performance in CMS using R=1.5, pT(matched parton) > 200 GeV jets (left) and
R=0.8, pT(matched parton) > 600 GeV jets (right) [?].